

Search results: All records where Creators/Authors contains "Gentine, Pierre"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo period.


  1. Abstract

    Global storm-resolving models (GSRMs) have gained widespread interest because of the unprecedented detail with which they resolve the global climate. However, it remains difficult to quantify objective differences in how GSRMs resolve complex atmospheric formations. This lack of comprehensive tools for comparing model similarities is a problem in many disparate fields that involve simulation tools for complex data. To address this challenge we develop methods to estimate distributional distances based on both nonlinear dimensionality reduction and vector quantization. Our approach automatically learns physically meaningful notions of similarity from low-dimensional latent data representations that the different models produce. This enables an intercomparison of nine GSRMs based on their high-dimensional simulation data (2D vertical velocity snapshots) and reveals that only six are similar in their representation of atmospheric dynamics. Furthermore, we uncover signatures of the convective response to global warming in a fully unsupervised way. Our study provides a path toward evaluating future high-resolution simulation data more objectively.

     
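    A minimal illustrative sketch of the distributional-distance idea in the abstract above (not the authors' code): each snapshot is reduced to a low-dimensional latent vector, the latents are vector-quantized against a shared codebook, and the two models are compared through their codebook-usage histograms. PCA, k-means, the Jensen-Shannon distance, and all array sizes here are stand-in assumptions; the study itself uses nonlinear dimensionality reduction.

```python
# Illustrative only: compare two models' snapshot distributions by
# (1) projecting flattened snapshots into a shared low-dimensional space,
# (2) vector-quantizing the latents with a shared codebook, and
# (3) measuring the distance between the resulting code histograms.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from scipy.spatial.distance import jensenshannon

def code_histogram(latents, kmeans):
    """Normalized histogram of codebook assignments for one model's latents."""
    codes = kmeans.predict(latents)
    counts = np.bincount(codes, minlength=kmeans.n_clusters)
    return counts / counts.sum()

# Synthetic stand-ins for snapshots from two models: (n_snapshots, n_pixels).
rng = np.random.default_rng(0)
model_a = rng.normal(size=(500, 64 * 64))
model_b = rng.normal(loc=0.3, size=(500, 64 * 64))

# 1) Shared low-dimensional representation (linear PCA as a placeholder).
pca = PCA(n_components=16).fit(np.vstack([model_a, model_b]))
z_a, z_b = pca.transform(model_a), pca.transform(model_b)

# 2) Shared codebook learned on the pooled latents (vector quantization).
codebook = KMeans(n_clusters=32, n_init=10, random_state=0).fit(np.vstack([z_a, z_b]))

# 3) Distributional distance between the two models' code-usage histograms.
d = jensenshannon(code_histogram(z_a, codebook), code_histogram(z_b, codebook))
print(f"Jensen-Shannon distance between models A and B: {d:.3f}")
```

    Swapping the PCA step for a (variational) autoencoder trained on the snapshots would recover the nonlinear latent representation the abstract refers to, without changing the rest of the comparison.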
  2. Projecting climate change is a generalization problem: We extrapolate the recent past using physical models across past, present, and future climates. Current climate models require representations of processes that occur at scales smaller than model grid size, which have been the main source of model projection uncertainty. Recent machine learning (ML) algorithms hold promise to improve such process representations but tend to extrapolate poorly to climate regimes that they were not trained on. To get the best of the physical and statistical worlds, we propose a framework, termed “climate-invariant” ML, incorporating knowledge of climate processes into ML algorithms, and show that it can maintain high offline accuracy across a wide range of climate conditions and configurations in three distinct atmospheric models. Our results suggest that explicitly incorporating physical knowledge into data-driven models of Earth system processes can improve their consistency, data efficiency, and generalizability across climate regimes.

     
    Free, publicly-accessible full text available February 7, 2025
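    A minimal sketch of the kind of input rescaling the "climate-invariant" framework describes (an illustration, not the authors' code): the learned parameterization is fed relative humidity rather than raw specific humidity, because the distribution of relative humidity shifts far less between cold and warm climates. The Bolton saturation-vapor-pressure formula is standard; the example values and the surrounding ML model are assumptions.

```python
# Illustrative climate-invariant input transformation: specific humidity,
# whose range grows strongly in a warmer climate, is mapped to relative
# humidity before it reaches the (otherwise unchanged) ML parameterization.
import numpy as np

def relative_humidity(q, temperature_k, pressure_pa):
    """Map specific humidity q (kg/kg) to relative humidity (0-1)."""
    t_c = temperature_k - 273.15
    # Saturation vapor pressure over liquid water (Bolton 1980), in Pa.
    e_sat = 611.2 * np.exp(17.67 * t_c / (t_c + 243.5))
    # Saturation specific humidity from e_sat and total pressure.
    q_sat = 0.622 * e_sat / (pressure_pa - 0.378 * e_sat)
    return q / q_sat

# Two example columns, e.g. a present-day and a warmer climate state.
q = np.array([0.012, 0.016])       # specific humidity, kg/kg
t = np.array([295.0, 299.0])       # temperature, K
p = np.array([95000.0, 95000.0])   # pressure, Pa
features = np.stack([relative_humidity(q, t, p), t, p], axis=-1)
print(features)  # climate-invariant feature vector fed to the ML model
```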
  3. Abstract

    We provide a global, long-term carbon flux dataset of gross primary production and ecosystem respiration generated using meta-learning, called MetaFlux. The idea behind meta-learning stems from the need to learn efficiently from sparse data by learning how to learn broad features across tasks, so that other poorly sampled tasks can be better inferred. Using a meta-trained ensemble of deep models, we generate global carbon products on daily and monthly timescales at a 0.25-degree spatial resolution from 2001 to 2021, through a combination of reanalysis and remote-sensing products. Site-level validation finds that MetaFlux ensembles have 5–7% lower validation error than their non-meta-trained counterparts. In addition, they are more robust to extreme observations, with 4–24% lower errors. We also examined the seasonality, interannual variability, and correlation to solar-induced fluorescence of the upscaled product and found that MetaFlux outperformed other machine-learning-based carbon products by 10–40%, especially in the tropics and semi-arid regions. Overall, MetaFlux can be used to study a wide range of biogeochemical processes.

     
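    MetaFlux has its own meta-learning and ensembling setup, so the sketch below only illustrates the general "learning how to learn across sparsely sampled tasks" idea with a generic Reptile-style meta-update, where each task stands in for a flux site with a handful of observations. The network size, number of sites, and hyperparameters are placeholders.

```python
# Generic Reptile-style meta-learning sketch (illustration only): adapt a copy
# of the model to one site's few samples, then nudge the shared initialization
# toward the adapted weights so it becomes easy to fine-tune on new sites.
import copy
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))

def inner_adapt(model, x, y, steps=5, lr=1e-2):
    """A few gradient steps on a single site's data."""
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(model(x), y).backward()
        opt.step()
    return model

# Synthetic "sites": sparse (predictors, flux) pairs mimicking flux towers.
torch.manual_seed(0)
sites = [(torch.randn(16, 8), torch.randn(16, 1)) for _ in range(20)]

meta_model = make_model()
meta_lr = 0.1
for _ in range(100):                                   # outer (meta) loop
    x, y = sites[torch.randint(len(sites), (1,)).item()]
    adapted = inner_adapt(meta_model, x, y)            # inner (task) loop
    with torch.no_grad():                              # Reptile meta-update
        for p_meta, p_task in zip(meta_model.parameters(), adapted.parameters()):
            p_meta += meta_lr * (p_task - p_meta)
```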
  4. Free, publicly-accessible full text available October 16, 2024
  5. Geostationary satellite reveals the asymmetrical impact of heatwaves on plant diurnal photosynthesis at the continental scale. 
    Free, publicly-accessible full text available August 4, 2024
  6. Accurate prediction of precipitation intensity is crucial for both human and natural systems, especially in a warming climate more prone to extreme precipitation. Yet, climate models fail to accurately predict precipitation intensity, particularly extremes. One missing piece of information in traditional climate model parameterizations is subgrid-scale cloud structure and organization, which affects precipitation intensity and stochasticity at coarse resolution. Here, using global storm-resolving simulations and machine learning, we show that, by implicitly learning subgrid organization, we can accurately predict precipitation variability and stochasticity with a low-dimensional set of latent variables. Using a neural network to parameterize coarse-grained precipitation, we find that the overall behavior of precipitation is reasonably predictable using large-scale quantities only; however, the neural network cannot predict the variability of precipitation (R² ≈ 0.45) and underestimates precipitation extremes. The performance is significantly improved when the network is informed by our organization metric, correctly predicting precipitation extremes and spatial variability (R² ≈ 0.9). The organization metric is implicitly learned by training the algorithm on a high-resolution precipitable water field, encoding the degree of subgrid organization. The organization metric shows large hysteresis, emphasizing the role of memory created by subgrid-scale structures. We demonstrate that this organization metric can be predicted as a simple memory process from information available at the previous time steps. These findings stress the role of organization and memory in the accurate prediction of precipitation intensity and extremes, and the necessity of parameterizing subgrid-scale convective organization in climate models to better project future changes in the water cycle and extremes.
    Free, publicly-accessible full text available May 16, 2024
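    A minimal sketch of the structure described in item 6 above (not the authors' network): coarse-grained precipitation is predicted from large-scale predictors plus a low-dimensional "organization" state that is carried forward in time, so the parameterization has memory of subgrid structure. The dimensions, the recurrent update, and the MLP heads are assumptions.

```python
# Illustrative organization-aware precipitation parameterization: a latent
# organization state is updated from its previous value (memory) and the
# current large-scale state, then used alongside that state to predict rain.
import torch
import torch.nn as nn

class OrgAwarePrecip(nn.Module):
    def __init__(self, n_state=16, n_org=4):
        super().__init__()
        self.n_org = n_org
        self.org_update = nn.Sequential(nn.Linear(n_state + n_org, 64),
                                        nn.ReLU(), nn.Linear(64, n_org))
        self.precip_head = nn.Sequential(nn.Linear(n_state + n_org, 64),
                                         nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state_seq):
        """state_seq: (time, batch, n_state) -> precipitation (time, batch, 1)."""
        org = torch.zeros(state_seq.shape[1], self.n_org)    # initial memory
        out = []
        for state in state_seq:                              # step through time
            org = self.org_update(torch.cat([state, org], dim=-1))
            out.append(self.precip_head(torch.cat([state, org], dim=-1)))
        return torch.stack(out)

precip = OrgAwarePrecip()(torch.randn(10, 32, 16))   # 10 time steps, 32 columns
print(precip.shape)                                  # torch.Size([10, 32, 1])
```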
  7. Free, publicly-accessible full text available May 1, 2024
  8. Abstract

    The Consistent Artificial Intelligence (AI)-based Soil Moisture (CASM) dataset is a global, consistent, and long-term remote-sensing soil moisture (SM) dataset created using machine learning. It is based on SM data from the NASA Soil Moisture Active Passive (SMAP) satellite mission and is aimed at extrapolating SMAP-quality SM back in time using previous satellite microwave platforms. CASM represents SM in the top soil layer, and it is defined on a global 25 km EASE-2 grid for 2002–2020 with a 3-day temporal resolution. The seasonal cycle is removed before neural network training to ensure that the model's skill is targeted at predicting SM extremes. Comparison of CASM to 367 global in-situ SM monitoring sites shows a SMAP-like median correlation of 0.66. Additionally, the SM product uncertainty was assessed, and both aleatoric and epistemic uncertainties were estimated and included in the dataset. The CASM dataset can be used to study a wide range of hydrological, carbon cycle, and energy processes, since only a consistent long-term dataset allows an assessment of changes in water availability and water stress.

     
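    A minimal sketch of how the two uncertainty components mentioned above can be produced (not the CASM code): the network predicts a mean and a log-variance (aleatoric, i.e. data noise), and Monte Carlo dropout over repeated stochastic forward passes yields an epistemic (model) spread. Input size, architecture, and dropout rate are placeholders.

```python
# Illustrative aleatoric + epistemic uncertainty estimation for a soil-moisture
# regressor, following the common mean/log-variance + MC-dropout recipe.
import torch
import torch.nn as nn

class SMNet(nn.Module):
    def __init__(self, n_in=12):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_in, 64), nn.ReLU(),
                                  nn.Dropout(0.1), nn.Linear(64, 2))

    def forward(self, x):
        mean, log_var = self.body(x).chunk(2, dim=-1)
        return mean, log_var

def predict_with_uncertainty(model, x, n_samples=50):
    model.train()                       # keep dropout active at inference
    means, noise_vars = [], []
    with torch.no_grad():
        for _ in range(n_samples):
            mean, log_var = model(x)
            means.append(mean)
            noise_vars.append(log_var.exp())
    means = torch.stack(means)
    aleatoric = torch.stack(noise_vars).mean(dim=0)   # average predicted noise
    epistemic = means.var(dim=0)                      # spread across passes
    return means.mean(dim=0), aleatoric, epistemic

mu, alea, epis = predict_with_uncertainty(SMNet(), torch.randn(4, 12))
print(mu.shape, alea.shape, epis.shape)
```

    Removing a day-of-year climatology from the target before training, as the abstract describes, would be a separate preprocessing step on the SM time series and is omitted here.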
  9. Abstract

    The process of evapotranspiration transfers liquid water from vegetation and soil surfaces to the atmosphere, the so-called latent heat flux (Q_LE), and modulates the Earth's energy, water, and carbon cycles. Vegetation controls Q_LE by regulating leaf stomata opening (the surface resistance r_s in the Big Leaf approach) and by altering surface roughness (the aerodynamic resistance r_a). Estimating r_s and r_a across different vegetation types is a key challenge in predicting Q_LE. We propose a hybrid approach that combines mechanistic modeling and machine learning for modeling Q_LE. The hybrid model combines, in an end-to-end setting, a feed-forward neural network that estimates the resistances from observations as intermediate variables with a mechanistic model. In the hybrid modeling setup, we make use of the Penman–Monteith equation in conjunction with multi-year flux measurements across different forest and grassland sites from the FLUXNET database. This hybrid model setup is successful in predicting Q_LE; however, the approach leads to equifinal solutions in terms of the estimated physical parameters. We follow two different strategies to constrain the hybrid model and thereby control for the equifinality that arises when the two resistances are estimated simultaneously. One strategy is to impose an a priori constraint on r_a based on mechanistic assumptions (theory-driven strategy), while the other makes use of more observational data and adds a constraint on r_a by predicting both the latent and the sensible heat flux (Q_H) through multi-task learning (data-driven strategy). Our results show that all hybrid models predict the target variables with a high degree of success, with R² = 0.82–0.89 for grasslands and R² = 0.70–0.80 for forest sites at the mean diurnal scale. The predicted r_s and r_a show strong physical consistency across the two regularized hybrid models, but are physically implausible in the under-constrained hybrid model. The hybrid models are robust in reproducing consistent results for energy fluxes and resistances across different scales (diurnal, seasonal, and interannual), reflecting their ability to learn the physical dependence of the target variables on the meteorological inputs. As a next step, we propose to test these heavily observation-informed parameterizations derived through hybrid modeling as a substitute for ad hoc formulations in Earth system models.
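    A minimal end-to-end sketch of the hybrid setup described above (not the authors' code): a small network maps meteorological drivers to the two resistances, which are inserted into the Penman–Monteith equation, and the whole chain is trained against observed latent heat flux. The input features, constants, and synthetic data are illustrative assumptions.

```python
# Illustrative hybrid model: NN -> (r_s, r_a) -> Penman-Monteith -> Q_LE,
# trained end-to-end so gradients flow through the physical equation.
import torch
import torch.nn as nn

RHO_A, C_P, GAMMA = 1.2, 1004.0, 66.0   # air density, heat capacity, psychrometric constant

def penman_monteith(available_energy, delta, vpd, r_a, r_s):
    """Latent heat flux (W m-2); delta and GAMMA in Pa K-1, vpd in Pa, resistances in s m-1."""
    numerator = delta * available_energy + RHO_A * C_P * vpd / r_a
    return numerator / (delta + GAMMA * (1.0 + r_s / r_a))

class ResistanceNet(nn.Module):
    """Maps meteorological drivers to positive surface and aerodynamic resistances."""
    def __init__(self, n_in=6):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_in, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, met):
        return nn.functional.softplus(self.net(met)) + 1.0   # keep r_s, r_a > 0

model = ResistanceNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Synthetic stand-ins for FLUXNET-style records (observations would go here).
torch.manual_seed(0)
met = torch.randn(256, 6)
available_energy = 400.0 * torch.rand(256)   # Rn - G, W m-2
delta = 100.0 + 100.0 * torch.rand(256)      # Pa K-1
vpd = 1500.0 * torch.rand(256)               # Pa
qle_obs = 200.0 * torch.rand(256)            # W m-2

for _ in range(200):
    r_s, r_a = model(met).unbind(dim=-1)
    qle_pred = penman_monteith(available_energy, delta, vpd, r_a, r_s)
    loss = nn.functional.mse_loss(qle_pred, qle_obs)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

    The two regularization strategies contrasted in the abstract would plug in here naturally: either fix r_a from a mechanistic formula (theory-driven) or add a sensible-heat (Q_H) output and train both fluxes jointly (data-driven).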